Existing reading assessments have increasingly been criticized by researchers, educators,
and policy makers, especially regarding their coverage, utility, and authenticity (e.g., Magliano, 
Millis, Ozuru, & McNamara, 2007; Pellegrino, Chudowsky, & Glaser, 2001; Rupp, Ferne, & 
Choi, 2006). Specifically, there is concern that current assessments often: are poorly aligned with 
contemporary theoretical constructs and empirical findings pertaining to reading processes and 
development; are insufficiently sensitive for detecting changes in the kinds of skills that are 
targeted by interventions; use mainly multiple-choice formats that emphasize strategic reasoning 
rather than understanding of the text; provide little diagnostic information for guiding 
instruction; and measure comprehension using tasks and texts that do not represent the full range 
of the purposeful literacy activities of 21st Century reading.

In view of the foregoing criticisms, it is clear that new reading assessments are needed 
and desired by educators and researchers. These new assessments should draw upon the lessons 
that have been learned from prior experiences in both classroom and laboratory, over the past 
several decades. Our aim is to integrate and extend strengths of past approaches in an innovative 
way, yet adhere to rigorous design principles that will ensure feasible implementation, good 
construct coverage, and strong psychometric properties. The instruments’ validity and
educational utility will also be enhanced if they are based on contemporary theory and research on
reading, learning, and instruction.

A Conceptual Framework for Assessment of Reading for Understanding (RFU) 

The conceptual framework for the design of reading comprehension assessments we 
propose is a distillation of ideas and evidence, drawn from and integrated over several 
longstanding theories (Sabatini, Albro, & O’Reilly, 2012; Sabatini, O’Reilly, & Albro, 2012). At
its foundation are six principles, which are consistent with most contemporary models of
reading, acquisition, and reading disabilities. Associated with each principle are key design
implications for assessments. These serve as a guide for the development of 21st Century reading
assessments and an accompanying research agenda that can test the validity and utility of 
particular constructs, design approaches to operationalize them, and procedures for 
implementing, scoring, and communicating results. 

There is no extant theory for which empirical support has been collected across the 
lifespan to test a unified model that integrates all facets of reading. Collectively, however, 
research and theories have been proposed and tested on a smaller scale in numerous studies. 
Admittedly, the empirical work is uneven developmentally (i.e., stronger on componential 
theories of reading in the K-3 range, stronger on mental representation models of understanding 
at the upper grades) and there remain healthy debates about which specific theories and models 
best fit the data. Nonetheless, there is sufficient agreement among theories to identify some basic 
principles in common, and there is ample empirical support for those principles. The framework 
represents our best synthesis of reading research as it relates to assessment constructs and design. 

Principle 1: Print skills and linguistic comprehension are each necessary components 
of reading proficiency, though neither individually is sufficient to ensure proficiency. This 
principle captures the essence of the simple view (Gough & Tunmer, 1986; Hoover & Gough, 
1990), in which reading comprehension is viewed as the product of two necessary but 
nonsufficient abilities: word recognition and linguistic comprehension. In the 20 years since its 
introduction, this model has been well supported in numerous studies (e.g., most recently, Adlof, 
Catts, & Little, 2006; Johnston & Kirby, 2006; Vellutino, Tunmer, & Jaccard, 2007). There is 
also extensive evidence for the validity and utility of word reading and text comprehension
measures for younger students and for older readers who lack mastery of basic print and
language skills (e.g., Deno, Fuchs, Marston, & Shin, 2001; Perfetti, Landi, & Oakhill, 2005;
Sabatini, 2002; Sabatini, Sawaki, Shore, & Scarborough, 2010). 
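
The multiplicative form of the simple view can be written compactly. The notation below follows the Gough and Tunmer (1986) formulation: R is reading comprehension, D is decoding (word recognition), and C is linguistic comprehension. Scaling each ability from 0 to 1 is the convention of that formulation, not something specified in the present framework.

```latex
R = D \times C, \qquad 0 \le D \le 1, \quad 0 \le C \le 1
```

Because the relation is a product and not a sum, R = 0 whenever either D = 0 or C = 0, which is the sense in which each component is necessary but neither is sufficient.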

Inefficient print processing (at both the word and text levels) can also detract from 
reading proficiency, particularly as texts and tasks become longer and more complex. Across 
development, print processing skills gradually become more efficient, fluent, and automatized, 
allowing most cognitive resources to be applied to higher order processes that are necessary for 
comprehension. With regard to understanding, when basic processes (e.g., decoding, word 
recognition) are not automatized, they require conscious effort, and draw the reader’s attention 
away from higher level comprehension processes (Cain, Oakhill, & Bryant, 2004; LaBerge & 
Samuels, 1974; Perfetti, 1985). Indeed, it is well demonstrated that reading proficiency correlates 
well with the fluency of word- and text-reading (Daane, Campbell, Grigg, Goodman, & Oranje, 
2005; Wayman, Wallace, Wiley, Ticha, & Espin, 2007). 

Furthermore, because unsuccessful reading comprehension can arise from word 
recognition limitations in young students and in struggling older readers, assessing 
comprehension proficiency solely in the print modality may underestimate the competence of 
these students. As shown in differential boost studies (e.g., Cahalan-Laitusis, Cook, Cline, King, 
& Sabatini, 2008; Fletcher, Denton, & Francis, 2005), these students can often demonstrate 
stronger comprehension skills when provided accommodations for their weak print skills. 

Compelling evidence is not available to identify the set of subskills that are necessary and 
useful to measure for summative purposes. For instance, there are many reliable and valid 
instruments for assessing phonological decoding, word recognition, word-reading efficiency, and 
text-reading fluency. However, it is not clear which of these sub-constructs (at what grade or
proficiency levels) adds value in assessment. Furthermore, the ‘linguistic comprehension’
construct is rather vaguely defined in the literature and has been operationalized in various ways
in research (e.g., Hagtvet, 2003; Keenan, Betjemann, & Olson, 2008), but consistently 
demonstrates strong predictive power at most ages (Aouad & Savage, 2009; Catts, Adlof, & 
Weismer, 2006; Catts, Hogan, & Fey, 2003). 

Design Implications: Reading comprehension difficulties can arise from weaknesses in 
either print processing or linguistic comprehension. In the lower grades, both skills should be 
measured directly. For nonproficient readers in higher grades, continuing the direct assessment 
of components can add value to summative assessment claims and interpretations. 

Principle 2: Both breadth and depth of vocabulary knowledge are essential for 
understanding. As readers mature, their vocabulary knowledge typically keeps pace with 
increases in world knowledge, and increases largely by the reader inferring the meanings of 
unfamiliar words from context. Vocabulary knowledge is important for reading comprehension, 
and strong correlations (r = .6 to .7) between them are typically observed (Anderson & Freebody, 
1981; Daneman, 1988; Hirsch, 2003). This association is evident from the start of schooling and 
strengthens thereafter (Stanovich, West, & Harrison, 1995). Understanding of a written text is 
likely to be thwarted if the reader fails to recognize accurately the meanings of as few as 5 to 10%
of the words in that text (Nagy & Scott, 2000). The reader’s depth as well as breadth of word
knowledge is important because many words have multiple meanings or idiomatic usages
(Ouellet, 2006). 
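
The 5 to 10% threshold can be made concrete with a small calculation. The sketch below is illustrative only: the function name `unknown_word_share`, the sample passage, and the set of known words are all hypothetical, and treating a word as "known" when it appears in a fixed list is a simplifying assumption that ignores depth of knowledge and multiple meanings.

```python
import re


def unknown_word_share(text, known_words):
    """Return the fraction of word tokens in `text` that are not in `known_words`."""
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in known_words)
    return unknown / len(tokens)


# Hypothetical passage and hypothetical reader vocabulary.
passage = "The canopy shelters epiphytes and many birds"
known = {"the", "canopy", "shelters", "and", "many", "birds"}

share = unknown_word_share(passage, known)
# One unknown token ("epiphytes") out of seven, which is above the 5 to 10% range.
print(f"Share of unknown words: {share:.1%}")
```

A measure of this kind captures breadth only; it says nothing about whether the reader knows the particular sense in which a word is used.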

Given its strong link with comprehension, vocabulary is often measured in reading 
assessments. Many tests employ multiple choice items to evaluate knowledge of synonyms and 
definitions, and vocabulary items can also be embedded in continuous text passages to examine
contextual effects (e.g., Sheehan, Kostin, & Persky, 2006). Oral language measures of receptive
and expressive vocabulary may include picture naming, picture matching, definitions, and
synonym production, among others. Although there is evidence linking reading skills to the use 
of morphological structure to infer word meaning (Carlisle & Stone, 2003; Kieffer & Lesaux, 
2007), assessment of morphology is rarely integrated into reading assessment. 

Design Implications: It is important to determine what drives weaknesses in vocabulary: 
lack of supporting knowledge and vocabulary in specific domains, inability to make
context-based inferences about new words, or limited general vocabulary. Hence, comprehension
assessments must measure lexical knowledge both within and out of context. 

Principle 3: Readers construct mental models of text meaning at multiple levels, from 
literal to gist to complex situation models. These models reflect and depend on the reader’s 
aims and prior knowledge. Contemporary views of reading comprehension emphasize the 
importance of differing levels or depths of understanding. To exemplify this principle, we rely on 
the construction-integration (“CI”) model (Kintsch, 1988, 1998), which posits three levels of
understanding that vary in their stability and depth. The surface level is a verbatim representation 
of the literal words, phrases, and structures of the text. It is typically retained only briefly, during 
which time the reader analyzes semantic and syntactic relationships to build the textbase 
representation or “gist” of the text. At this level, verbatim retention is lost, but critical meaning is 
represented abstractly as the key propositions and relationships that can be inferred among them. 

The intended meaning of a text cannot always be understood from the textbase, because 
important pertinent information may be left out (Beck, McKeown, & Gromoll, 1989). The reader 
must infer meaning based on prior knowledge, resulting in a deeper and more robust level of
representation, the situation model (McNamara & Kintsch, 1996). Construction of a situation
model allows a reader to learn new information from text by integrating unfamiliar terms and
ideas with familiar schemas and knowledge. 

A useful assessment, therefore, should indicate how well a student can construct different 
levels of understanding. It is also valuable to assess the extent to which doing so is constrained 
by limited background knowledge, different reading goals, tasks and instructions. 

Design Implications: Because depth of comprehension varies between and within 
individuals, assessments should consider the knowledge of the reader and distinguish the literal 
surface code, textbase, and complex mental schemas (situation models). 

Principle 4: Reading is ordinarily a purposeful activity, aimed at attaining a coherent 
understanding of a text that is sufficient for the reader’s goal. Successful readers monitor and 
self-regulate their comprehension, enabling the repair of mental models as needed. Strategies 
provide a vehicle for driving deeper levels of processing. One’s purpose for reading can 
influence what is attended to, how it is analyzed, what “standard of coherence” (desired level of 
comprehension) is adopted, and thus how deeply text is comprehended (van den Broek, Risden, 
& Husebye-Hartman, 1995). When a low standard of coherence is chosen, gaps in
understanding are tolerable to the reader, whereas readers who adopt a high standard of coherence
must expend additional effort to deepen and embellish understanding, to ensure that information 
is integrated into a coherent situation model, and to monitor and repair breaks in understanding. 

When a higher standard of coherence or deeper level of processing is demanded, then the 
reader may call upon reading strategies as a vehicle for how to construct and organize more 
robust models of the text. A reading comprehension strategy is “a cognitive or behavioral action
that is enacted under particular contextual conditions, with the goal of improving some aspect of
comprehension” (Graesser, 2007, p. 6), such as question asking (e.g., King, 2007),
self-explanation (e.g., McNamara, O’Reilly, Rowe, Boonthum, & Levinstein, 2007), summarization
(e.g., Yu, 2003), and the use of graphic organizers and tools for making text structure explicit
(e.g., Meyer & Wijekumar, 2007). Although a reader’s goals are often self-selected, they can be
influenced by the nature of the task and text. When the texts and instructions are varied during
assessment, the reader should adjust their standard of coherence to match the task demands, and
performance differences can be evaluated, providing key information for understanding the nature
of comprehension difficulties and for guiding instruction.

Design Implications: More valid inferences about reading will be obtained by specifying 
goals for reading activities during assessment, because able readers will evaluate the adequacy 
of their mental models in relation to those goals, and reconstruct the models accordingly. 

Principle 5: Skilled reading includes proficiency in evaluating and synthesizing 
information across multiple texts. This requirement is driven by the increasing prevalence of 
digital literacy activities. Reading skills will increasingly be deployed in evolving digital 
environments, and a hybrid of print and digital skills will be essential to proficiency. Even in 
elementary school, students are expected to consult multiple sources when engaging in literacy 
activities involving the internet. However, search engines retrieve enormous numbers of 
documents on many topics. If a report on “the rainforest” is assigned, hits will likely include a 
Wikipedia webpage, numerous blogs, the Rainforest Café site, government policy statements, 
and so forth that vary in content, media (pictures, videos), genre (narrative, argument), and 
intention (entertain, persuade). 

To acquire a deep understanding of multiple documents, the reader must also construct a 
situations model (Perfetti, Rouet, & Britt, 1999) that integrates the information from multiple
document nodes. A situations model requires a skilled reader to (a) encode source information,
(b) evaluate relevance and trustworthiness, and (c) determine which propositional content from
the documents should be emphasized in the situations model or product (e.g., report, poster, etc.). 

The relatively recent advent and widespread societal use of the internet has expanded the 
skill set for literacy; 21st Century readers must be facile in navigating and utilizing e-print
environments (Partnership for 21st Century Skills, 2008). Today’s proficient reader must be able
to deploy skills of searching, retrieving, understanding, locating, evaluating, interpreting, and 
integrating documents in digital contexts, as well as in print. Compared to print, digital 
environments are likely to provide novel affordances for deploying literacy skills, including: 
email, blogging, text messaging, using search engines, navigating websites and so forth (Coiro, 
2009). These activities do not have exact parallels in the print world and the easily-accessed 
information is largely unfiltered (for quality and credibility) and imposes a heavier burden on the 
reader to understand, evaluate, and interpret the information appropriately and wisely. 

Design Implications: Assessing the understanding of just one text at a time does not cover
the full construct. Evaluating situation models constructed from multiple sources can be used to 
examine students’ evaluation, integration, and synthesis of information. New assessments should 
be designed to be appropriate for evaluating skills in both print and digital environments. 

Principle 6: Growth in reading proficiency consists of incremental expansion of 
knowledge and skills for the understanding of increasingly complex texts and task demands. 
Growth is driven primarily by the quality and quantity of instruction, experience, and practice, 
resulting in substantial variability within and between grade levels, schools, and 
socioeconomic strata. In our discussion of guiding principles, we have noted there are
developmental shifts in the relative importance of each principle in accounting for proficiency
differences in reading, and consequently the implications for assessment. In our view, these
developmental changes are gradual and incremental. Although differences in reading
performance between students in 1st versus 4th grade, or 6th versus 12th grade, can look
qualitatively dissimilar, there is no firm evidence for discrete stages; instead, dramatic 
improvements over the longer term result from relatively small continuous increments -- in the 
mastery and automaticity of acquired print skills (Principle 1); in the breadth and depth of oral 
language knowledge and skills (Principles 1, 2); and in the variety and complexity of texts and 
tasks (Principles 3, 4, 5). 

The rate of growth of these aspects of reading for understanding can differ in ways that 
will guide the design of the proposed assessments. Notably, as grade increases, (a) less emphasis
will be placed on examining children’s acquisition of print skills; and (b) more emphasis will be 
placed on assessing the construction of mental models for differing text types, reading aims, and 
multiple media sources. For certain older readers, it might also be useful to measure basic skills. 

Design Implications: Assessments should include tasks and reading materials along a 
developmental continuum of increasing proficiency in all aspects of reading that yield valuable 
information about achievement differences and sources of difficulty in reading comprehension. 

Description of a New Assessment System of Reading for Understanding 

Building on this conceptual foundation, we have been designing a new theoretically
based, developmentally sensitive assessment system that consists of two main parts: (1) a set of
“integrated” comprehension tasks, in which students read for understanding to attain a defined 
aim; and (2) a set of supplementary tests of component skills, for use with nonproficient readers, 
to provide information to identify or rule out potential bases for comprehension difficulties. 

The logic of this approach is that global, integrated reading texts and task performances
afford multiple cues (e.g., inferential, knowledge) that an individual can exploit to bootstrap
performance to compensate for weak individual component subskills (O’Reilly & McNamara,
2007; Walczyk, Marsiglia, Johns, & Bryan, 2004). For proficient readers, the availability of 
multiple, overlapping sources of information reduces the complexity of processing. For the 
nonproficient reader, it may actually enable performance at an artificially high level that is not 
sustainable as the student encounters more complex texts and tasks in subsequent years. Thus, 
given their utility for predicting potential risk for a decline in achievement, separate tests of 
component skills are warranted because weaknesses in those areas could be masked in an 
integrated assessment. 

A Global, Integrated Summative Assessment (GISA) of Reading for Understanding 

We envision an integrated assessment of reading comprehension that parallels the kinds of 
activities that students typically engage in. These activities begin with a specific purpose or goal, 
and proceed with actions that achieve that goal. The actions include searching for relevant 
information, evaluating its quality and pertinence, synthesizing and integrating it with other 
information, and producing some product that satisfies the goal. As such, reading for 
understanding is not usually a passive activity that involves answering comprehension questions 
on a collection of unrelated passages, but rather a focused and more complex process of 
constructing meaning from text(s) in order to meet task goals. 

We recognize the challenges and pitfalls of previous performance assessments that relied 
on a small set of complex tasks and that yielded limited information per individual (weakening 
reliability and discrimination). In contrast, we envision maintaining a large percentage of 
discrete, objectively scored items, with a smaller mix of constructed response items. We have
had success with models from other reading comprehension projects that we can build upon
(Bennett, 2011; Deane, Sabatini, & O’Reilly, 2011; O’Reilly & Sheehan, 2009; Sheehan &
O’Reilly, 2011). 

The GISA will examine the student’s proficiencies in a) constructing different levels of 
representations (textbase, situation, and multiple source) (Kintsch, 1998), b) familiarity with text 
structure and genre differences (Goldman & Rakestraw, 2000), c) deployment of 
executive/metacognitive processes (Schraw, 2000), and d) application of strategies for attaining a 
literacy goal (McCrudden & Schraw, 2007). The goal is to broaden the coverage of the 
assessment, while maintaining standardized testing conditions and adequate measurement.

Component Skills Assessments

For assessments to more appropriately cover the full reading construct range, we 
hypothesize that measurement of component skills for nonproficient readers is justified and will 
provide more useful information to guide instructional decision-making than relying solely on 
integrated measures. Separate component skill subsections of a summative assessment, targeted
at less proficient readers, can address this need and provide more specific information about the
strengths and weaknesses underlying nonproficient reading. However, on their own, component
skills do not sum up to reading proficiency. That is, one could hypothetically be proficient in 
each subskill and still fail to adequately integrate them. 

Conclusions & Implications 

Reading and reading comprehension always have been and always will be social
constructs. Writing systems change, languages evolve, technologies advance, cultures and 
societies shift, and the social value and meaning of literacy follow along. We cannot define or 
legislate what reading comprehension is, but we can observe and describe it in the historical
moment, use the tools of science to understand and interpret it, and then hopefully help
individuals and groups to better acquire and use this valuable social technology of learning,
communication, personal growth, and societal participation (Venezky, 1990). The rationale and 
assessment system we have described tries to capture key aspects of literacy in this historical 
moment. It tries to be as explicit as current research permits, in parsing the construct into 
elements or principles that we anticipate may interact with individual differences among 
learners. 

The complexities of one’s language, one’s writing system, and the social context of 
literacy practices and uses will interact with individual differences, resulting in relative 
advantages or disadvantages for individuals as they learn to read and become literate. Being
more explicit about the elements of the construct will hopefully improve the value and utility of
resulting assessment scores in informing how best to help individuals with differences to 
demonstrate what they can do and what they struggle with, en route to their acquiring 
proficiency. 